Overview

Dataset statistics

Number of variables: 30
Number of observations: 168120
Missing cells: 1686140
Missing cells (%): 33.4%
Duplicate rows: 540
Duplicate rows (%): 0.3%
Total size in memory: 91.5 MiB
Average record size in memory: 570.6 B
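The headline figures follow from simple counts. Here, 168,120 rows × 30 columns = 5,043,600 cells, so 1,686,140 missing cells is 1,686,140 / 5,043,600 ≈ 33.4%, and 540 / 168,120 ≈ 0.32% of rows are duplicates. The sketch below recomputes these figures for any DataFrame with pandas. The helper name `overview` and the toy frame are illustrative. Using `memory_usage(deep=True)` for the memory figures is an assumption, and pandas-profiling's own accounting may differ slightly.

```python
import numpy as np
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Recompute the 'Dataset statistics' figures for a DataFrame (illustrative helper)."""
    n_rows, n_cols = df.shape
    missing = int(df.isna().sum().sum())         # total missing cells
    dups = int(df.duplicated().sum())            # rows identical to an earlier row (NaNs compare equal)
    mem = int(df.memory_usage(deep=True).sum())  # bytes, including object payloads
    return {
        "n_vars": n_cols,
        "n_obs": n_rows,
        "missing_cells": missing,
        "missing_pct": 100 * missing / (n_rows * n_cols),
        "duplicate_rows": dups,
        "duplicate_pct": 100 * dups / n_rows,
        "avg_record_bytes": mem / n_rows,
    }


# Toy frame (not the real data): 4 rows x 3 columns
toy = pd.DataFrame({
    "SEXO": ["1", "3", "1", "1"],
    "DIAGSEC6": [np.nan] * 4,
    "DIAS_PERM": [2, 5, 2, 2],
})
stats = overview(toy)
print(stats)

# The same arithmetic with this report's numbers
print(100 * 1686140 / (168120 * 30), 100 * 540 / 168120)
```

In the toy frame, 4 of 12 cells are missing, and rows 2 and 3 repeat row 0. That gives 2 duplicate rows.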

Variable types

NUM: 12
CAT: 12
UNSUPPORTED: 4
BOOL: 2

Reproduction

Analysis started: 2020-04-06 04:30:18.916567
Analysis finished: 2020-04-06 04:32:46.054261
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Warnings

Dataset has 540 (0.3%) duplicate rows [Duplicates]
NASC has a high cardinality: 32641 distinct values [High cardinality]
DT_INTER has a high cardinality: 4487 distinct values [High cardinality]
DIAG_PRINC has a high cardinality: 908 distinct values [High cardinality]
DIAG_SECUN has a high cardinality: 696 distinct values [High cardinality]
DIAGSEC1 has a high cardinality: 600 distinct values [High cardinality]
DIAGSEC2 has a high cardinality: 112 distinct values [High cardinality]
VAL_SH is highly correlated with UTI_MES_TO and 1 other field [High correlation]
UTI_MES_TO is highly correlated with VAL_SH [High correlation]
VAL_SP is highly correlated with VAL_SH [High correlation]
CGC_HOSP has 59129 (35.2%) missing values [Missing]
DIAG_SECUN has 84081 (50.0%) missing values [Missing]
ETNIA has 36659 (21.8%) missing values [Missing]
DIAGSEC1 has 161680 (96.2%) missing values [Missing]
DIAGSEC2 has 167827 (99.8%) missing values [Missing]
DIAGSEC3 has 168052 (> 99.9%) missing values [Missing]
DIAGSEC4 has 168113 (> 99.9%) missing values [Missing]
DIAGSEC5 has 168119 (> 99.9%) missing values [Missing]
DIAGSEC6 has 168120 (100.0%) missing values [Missing]
DIAGSEC7 has 168120 (100.0%) missing values [Missing]
DIAGSEC8 has 168120 (100.0%) missing values [Missing]
DIAGSEC9 has 168120 (100.0%) missing values [Missing]
MUNIC_RES is highly skewed (γ1 = 26.43076403) [Skewed]
UTI_INT_TO is highly skewed (γ1 = 69.38404365) [Skewed]
NACIONAL is highly skewed (γ1 = 72.1097067) [Skewed]
NASC contains only datetime values but is stored as categorical. Consider applying pd.to_datetime(). [Type]
DT_INTER contains only datetime values but is stored as categorical. Consider applying pd.to_datetime(). [Type]
DIAGSEC6 has an unsupported type; check whether it needs cleaning or further analysis. [Rejected]
DIAGSEC7 has an unsupported type; check whether it needs cleaning or further analysis. [Rejected]
DIAGSEC8 has an unsupported type; check whether it needs cleaning or further analysis. [Rejected]
DIAGSEC9 has an unsupported type; check whether it needs cleaning or further analysis. [Rejected]
UTI_MES_TO has 143218 (85.2%) zeros [Zeros]
UTI_INT_TO has 167961 (99.9%) zeros [Zeros]
DIAS_PERM has 2756 (1.6%) zeros [Zeros]
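Several of these warnings point to concrete clean-up steps. The sketch below assumes the data are loaded into a pandas DataFrame with the column names shown in this report. It does four things:

- parses NASC and DT_INTER with `pd.to_datetime`, as the Type warnings suggest;
- drops the all-missing DIAGSEC6 to DIAGSEC9 columns;
- recomputes a zero share;
- recomputes a skewness.

The helper names are hypothetical, and the toy rows are illustrative, not real records. It is also an assumption that the report's γ1 is pandas' bias-adjusted `Series.skew()`.

```python
import numpy as np
import pandas as pd


def clean_alerted_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Apply two fixes suggested by the warnings (hypothetical helper)."""
    out = df.copy()
    # Type warnings: NASC and DT_INTER hold ISO dates stored as strings
    for col in ("NASC", "DT_INTER"):
        if col in out.columns:
            out[col] = pd.to_datetime(out[col], format="%Y-%m-%d", errors="coerce")
    # Rejected warnings: DIAGSEC6..DIAGSEC9 are 100% missing, so drop all-empty columns
    return out.dropna(axis=1, how="all")


def zeros_pct(s: pd.Series) -> float:
    """Percentage of exact zeros, as in the Zeros warnings."""
    return 100 * float((s == 0).mean())


# Toy rows shaped like the real table (values are illustrative, not real records)
toy = pd.DataFrame({
    "NASC": ["1993-04-16", "1936-02-10", "1920-10-12", "1980-02-02"],
    "DT_INTER": ["2008-01-01", "2011-12-28", "2012-12-01", "2017-07-01"],
    "UTI_INT_TO": [0, 0, 0, 12],
    "DIAGSEC6": [np.nan] * 4,
})

cleaned = clean_alerted_columns(toy)
print(cleaned.dtypes)
print(zeros_pct(toy["UTI_INT_TO"]))
print(toy["UTI_INT_TO"].skew())  # positive: one large value gives a long right tail
```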

Variables

CGC_HOSP
Real number (ℝ≥0)

MISSING
Distinct count: 107
Unique (%): 0.1%
Missing: 59129
Missing (%): 35.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 13925397833063.29
Minimum: 1431061000176.0
Maximum: 73976722000150.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1.431061e+12
5th percentile: 7.267476001e+12
Q1: 1.3937131e+13
Median: 1.393713101e+13
Q3: 1.5170723e+13
95th percentile: 1.5180714e+13
Maximum: 7.3976722e+13
Range: 7.2545661e+13
Interquartile range (IQR): 1.233591999e+12

Descriptive statistics

Standard deviation: 2.785869391e+12
Coefficient of variation (CV): 0.2000567183
Kurtosis: 16.92654596
Mean: 1.392539783e+13
Mean absolute deviation (MAD): 1.160732682e+12
Skewness: -1.937242346
Sum: 1.517743035e+18
Variance: 7.761068265e+24
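The quantile and descriptive rows are linked by simple identities, which makes them easy to sanity-check. For CGC_HOSP:

- CV = 2.785869391e+12 / 1.392539783e+13 ≈ 0.2001;
- IQR = Q3 - Q1 = 1.5170723e+13 - 1.3937131e+13 ≈ 1.2336e+12.

The sketch below recomputes these rows for one column. It assumes the report uses pandas defaults: linearly interpolated quantiles, the sample standard deviation (ddof=1), and bias-corrected excess kurtosis. The function name is hypothetical.

```python
import pandas as pd


def describe_like_report(s: pd.Series) -> dict:
    """Recompute the quantile/descriptive rows for one numeric column (hypothetical helper)."""
    s = s.dropna()  # statistics are computed on non-missing values only
    q05, q1, median, q3, q95 = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    mean, std = s.mean(), s.std()  # std uses ddof=1
    return {
        "5th percentile": q05,
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th percentile": q95,
        "Range": s.max() - s.min(),
        "IQR": q3 - q1,
        "Standard deviation": std,
        "CV": std / mean,
        "Variance": std ** 2,
        "Kurtosis": s.kurt(),  # excess kurtosis: 0 for a normal distribution
        "Skewness": s.skew(),
        "Sum": s.sum(),
    }


# Cross-check two identities against CGC_HOSP's reported figures
cv = 2.785869391e12 / 1.392539783e13
iqr = 1.5170723e13 - 1.3937131e13
print(cv, iqr)
print(describe_like_report(pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, None])))
```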
Histogram with fixed size bins (bins=10)
Common values

Value | Count | Frequency (%)
1.3937131e+13 | 20348 | 12.1%
1.5170723e+13 | 11069 | 6.6%
1.5180714e+13 | 9762 | 5.8%
1.393713101e+13 | 9570 | 5.7%
1.5178551e+13 | 8363 | 5.0%
1.3937131e+13 | 6100 | 3.6%
1.3937131e+13 | 5757 | 3.4%
1.5153745e+13 | 5613 | 3.3%
1.3937131e+13 | 5303 | 3.2%
1.393713101e+13 | 4026 | 2.4%
Other values (97) | 23080 | 13.7%
(Missing) | 59129 | 35.2%

Minimum 5 values

Value | Count | Frequency (%)
1.431061e+12 | 2 | < 0.1%
2.106150001e+12 | 1023 | 0.6%
2.466144e+12 | 2401 | 1.4%
2.762633e+12 | 2 | < 0.1%
3.204913e+12 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
7.3976722e+13 | 1 | < 0.1%
6.3254312e+13 | 1 | < 0.1%
6.3103048e+13 | 1 | < 0.1%
6.3088645e+13 | 1 | < 0.1%
6.1986402e+13 | 1 | < 0.1%

CEP
Real number (ℝ≥0)

Distinct count: 18900
Unique (%): 11.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 42162280.73229241
Minimum: 1001000
Maximum: 98880000
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1001000
5th percentile: 40230320
Q1: 40715315
Median: 41310220
Q3: 42850000
95th percentile: 47800114
Maximum: 98880000
Range: 97879000
Interquartile range (IQR): 2134685

Descriptive statistics

Standard deviation: 2371603.988
Coefficient of variation (CV): 0.05624942358
Kurtosis: 49.80638208
Mean: 42162280.73
Mean absolute deviation (MAD): 1702109.874
Skewness: 1.499763734
Sum: 7.088322637e+12
Variance: 5.624505477e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1001000. 7466287.5 39805495. 40005495. 40010015. ... 56316195. 56317403. 56481602. 59262485. 98880000. ], "bayesian blocks" binning strategy used)
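The "bayesian blocks" binning named here picks variable-width bins by maximizing a fitness function over all partitions of the sorted data. Each extra bin costs a fixed penalty. Below is a from-scratch sketch of the "events" variant from Scargle et al. (2013), using dynamic programming over the unique values. It is not pandas-profiling's own code, which presumably delegates to a library routine. The function name and the default false-alarm probability `p0=0.05` are assumptions.

```python
import numpy as np


def bayesian_block_edges(data, p0=0.05):
    """Sketch of Bayesian Blocks bin edges ('events' fitness, Scargle et al. 2013)."""
    # Work on unique sorted values, weighting each by how often it occurs
    t, x = np.unique(np.asarray(data, dtype=float), return_counts=True)
    n = t.size
    if n < 2:
        return np.repeat(t, 2)  # zero or one distinct value: nothing to split
    # Candidate edges: the extremes plus midpoints between consecutive values
    edges = np.concatenate([t[:1], 0.5 * (t[1:] + t[:-1]), t[-1:]])
    block_length = t[-1] - edges
    # Penalty per extra block (the paper's empirical calibration for false-alarm rate p0)
    ncp_prior = 4 - np.log(73.53 * p0 * n ** -0.478)

    best = np.zeros(n)
    last = np.zeros(n, dtype=int)
    for r in range(n):
        # Width and event count of every candidate block ending at cell r
        width = block_length[: r + 1] - block_length[r + 1]
        count = np.cumsum(x[: r + 1][::-1])[::-1]
        fitness = count * (np.log(count) - np.log(width)) - ncp_prior
        fitness[1:] += best[:r]  # add the best score of the partition before each start
        last[r] = np.argmax(fitness)
        best[r] = fitness[last[r]]

    # Backtrack from the final cell to recover the optimal change points
    change_points = []
    ind = n
    while True:
        change_points.append(ind)
        if ind == 0:
            break
        ind = last[ind - 1]
    return edges[change_points[::-1]]


# Bimodal toy data: the edges should adapt to the two clusters
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0, 1, 300), rng.normal(8, 0.5, 300)])
edges = bayesian_block_edges(sample)
counts, _ = np.histogram(sample, bins=edges)
print(edges, counts)
```

The loop is O(n²) in the number of distinct values, so a column such as CEP, with 18,900 distinct values, is about the practical scale for this naive form.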
Common values

Value | Count | Frequency (%)
43700000 | 7373 | 4.4%
42700000 | 4497 | 2.7%
44470000 | 3175 | 1.9%
44460000 | 2987 | 1.8%
41250000 | 2929 | 1.7%
40415000 | 2464 | 1.5%
42802580 | 2440 | 1.5%
40050410 | 2002 | 1.2%
42850000 | 1947 | 1.2%
42820000 | 1783 | 1.1%
Other values (18890) | 136523 | 81.2%

Minimum 5 values

Value | Count | Frequency (%)
1001000 | 1 | < 0.1%
1310935 | 1 | < 0.1%
1509970 | 1 | < 0.1%
2325529 | 1 | < 0.1%
3242020 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
98880000 | 1 | < 0.1%
96880970 | 1 | < 0.1%
96880000 | 2 | < 0.1%
96504182 | 1 | < 0.1%
94960100 | 1 | < 0.1%

MUNIC_RES
Real number (ℝ≥0)

SKEWED
Distinct count: 553
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 292482.9683737806
Minimum: 110080
Maximum: 530180
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 110080
5th percentile: 290570
Q1: 292740
Median: 292740
Q3: 292740
95th percentile: 293070
Maximum: 530180
Range: 420100
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 3626.159064
Coefficient of variation (CV): 0.0123978469
Kurtosis: 2273.198726
Mean: 292482.9684
Mean absolute deviation (MAD): 615.2774989
Skewness: 26.43076403
Sum: 4.917223664e+10
Variance: 13149029.56
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[110080. 150376.5 215415. 230525. 230770. ... 411745. 431875. 520135. 521447.5 530180. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
292740 | 111837 | 66.5%
293070 | 7452 | 4.4%
290570 | 6745 | 4.0%
291920 | 4980 | 3.0%
293320 | 3221 | 1.9%
291610 | 2998 | 1.8%
290650 | 2774 | 1.7%
291005 | 1994 | 1.2%
292100 | 1773 | 1.1%
292950 | 1430 | 0.9%
Other values (543) | 22916 | 13.6%

Minimum 5 values

Value | Count | Frequency (%)
110080 | 1 | < 0.1%
120040 | 5 | < 0.1%
130260 | 3 | < 0.1%
140010 | 1 | < 0.1%
150060 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
530180 | 1 | < 0.1%
530070 | 1 | < 0.1%
530010 | 2 | < 0.1%
521645 | 1 | < 0.1%
521250 | 1 | < 0.1%

NASC
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 32641
Unique (%): 19.4%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
1993-04-16 | 125 | 0.1%
1936-02-10 | 104 | 0.1%
1920-10-12 | 97 | 0.1%
1980-02-02 | 92 | 0.1%
1963-05-01 | 66 | < 0.1%
2011-12-07 | 56 | < 0.1%
1980-07-11 | 52 | < 0.1%
2008-01-18 | 51 | < 0.1%
2006-05-28 | 50 | < 0.1%
2010-01-01 | 48 | < 0.1%
Other values (32631) | 167379 | 99.6%

Length

Max length: 10
Mean length: 10
Min length: 10

Value | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Dash_Punctuation | 1 | 9.1%

Value | Count | Frequency (%)
Common | 11 | 100.0%

Value | Count | Frequency (%)
ASCII | 11 | 100.0%
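The three small tables under Length classify characters by Unicode general category, script and block. Their counts match the number of distinct characters in the column. NASC uses the ten digits plus "-", which gives 10/11 ≈ 90.9% Decimal_Number and 1/11 ≈ 9.1% Dash_Punctuation. Below is a sketch under that reading, using Python's `unicodedata.category`. The mapping `LONG_NAMES` is a hand-written, hypothetical helper that covers only the categories appearing in this report.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from two-letter Unicode category codes to the names shown in the report
LONG_NAMES = {
    "Nd": "Decimal_Number",
    "Pd": "Dash_Punctuation",
    "Lu": "Uppercase_Letter",
    "Ll": "Lowercase_Letter",
    "Po": "Other_Punctuation",
}


def category_table(values):
    """Count distinct characters per Unicode general category, with percentages."""
    distinct_chars = set("".join(values))
    counts = Counter(
        LONG_NAMES.get(unicodedata.category(ch), unicodedata.category(ch))
        for ch in distinct_chars
    )
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.most_common()}


# The ten most frequent NASC values from the table above
dates = ["1993-04-16", "1936-02-10", "1920-10-12", "1980-02-02", "1963-05-01",
         "2011-12-07", "1980-07-11", "2008-01-18", "2006-05-28", "2010-01-01"]
print(category_table(dates))
```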
 

SEXO
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
1 | 90827 | 54.0%
3 | 77293 | 46.0%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 2 | 100.0%

Value | Count | Frequency (%)
Common | 2 | 100.0%

Value | Count | Frequency (%)
ASCII | 2 | 100.0%

UTI_MES_TO
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS
Distinct count: 85
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.27984177968118
Minimum: 0
Maximum: 99
Zeros: 143218
Zeros (%): 85.2%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 8
Maximum: 99
Range: 99
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.642449423
Coefficient of variation (CV): 3.627361988
Kurtosis: 51.45973925
Mean: 1.27984178
Mean absolute deviation (MAD): 2.189940202
Skewness: 5.990889602
Sum: 215167
Variance: 21.55233664
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 3.5 4.5 ... 38.5 45.5 60.5 71.5 99. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 143218 | 85.2%
2 | 2993 | 1.8%
1 | 2823 | 1.7%
3 | 2500 | 1.5%
7 | 2026 | 1.2%
4 | 2022 | 1.2%
5 | 1700 | 1.0%
6 | 1578 | 0.9%
8 | 1059 | 0.6%
9 | 822 | 0.5%
Other values (75) | 7379 | 4.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 143218 | 85.2%
1 | 2823 | 1.7%
2 | 2993 | 1.8%
3 | 2500 | 1.5%
4 | 2022 | 1.2%

Maximum 5 values

Value | Count | Frequency (%)
99 | 1 | < 0.1%
96 | 2 | < 0.1%
93 | 1 | < 0.1%
92 | 3 | < 0.1%
91 | 1 | < 0.1%

UTI_INT_TO
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 37
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.010022602902688556
Minimum: 0
Maximum: 66
Zeros: 167961
Zeros (%): 99.9%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 0
Maximum: 66
Range: 66
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.453869594
Coefficient of variation (CV): 45.28460305
Kurtosis: 6362.349065
Mean: 0.0100226029
Mean absolute deviation (MAD): 0.02002624799
Skewness: 69.38404365
Sum: 1685
Variance: 0.2059976083
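The MAD figures in this report match the mean absolute deviation around the mean, not the median absolute deviation. UTI_INT_TO shows why. 99.9% of its values equal the median (0), so the median absolute deviation is 0, yet 0.02002624799 is reported. That value can be rebuilt from the summary alone: the 159 non-zero values are integers of at least 1, so all of them lie above the mean. The sketch below does this arithmetic and contrasts both definitions on a toy column. It assumes pandas-profiling 2.x called pandas' `Series.mad()`, since removed, which computed the mean absolute deviation. The helper names are hypothetical.

```python
import pandas as pd


def mean_abs_dev(s: pd.Series) -> float:
    """Mean absolute deviation around the mean (what Series.mad() returned in older pandas)."""
    return float((s - s.mean()).abs().mean())


def median_abs_dev(s: pd.Series) -> float:
    """Median absolute deviation around the median."""
    return float((s - s.median()).abs().median())


# Rebuild UTI_INT_TO's MAD from the report: 168120 rows, 167961 zeros, sum 1685.
# Each zero is `mean` below the mean; the non-zero values together sit
# (total - nonzero * mean) above it, whatever their individual values are.
n, zeros, total = 168_120, 167_961, 1_685
mean = total / n
mad_from_summary = (zeros * mean + (total - (n - zeros) * mean)) / n
print(mean, mad_from_summary)

# On a toy column with the same shape, the two definitions disagree sharply
toy = pd.Series([0] * 999 + [10])
print(mean_abs_dev(toy), median_abs_dev(toy))
```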
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 6.5 15.5 31.5 66. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
0 | 167961 | 99.9%
3 | 15 | < 0.1%
6 | 15 | < 0.1%
2 | 14 | < 0.1%
1 | 11 | < 0.1%
4 | 11 | < 0.1%
5 | 10 | < 0.1%
9 | 10 | < 0.1%
10 | 8 | < 0.1%
7 | 6 | < 0.1%
Other values (27) | 59 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 167961 | 99.9%
1 | 11 | < 0.1%
2 | 14 | < 0.1%
3 | 15 | < 0.1%
4 | 11 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
66 | 1 | < 0.1%
54 | 1 | < 0.1%
42 | 1 | < 0.1%
37 | 1 | < 0.1%
36 | 1 | < 0.1%

VAL_SH
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 51772
Unique (%): 30.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1259.4719468831788
Minimum: 19.03
Maximum: 54514.88
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 19.03
5th percentile: 160.62
Q1: 467.47
Median: 544.07
Q3: 896.4125
95th percentile: 4986.065
Maximum: 54514.88
Range: 54495.85
Interquartile range (IQR): 428.9425

Descriptive statistics

Standard deviation: 2438.11638
Coefficient of variation (CV): 1.935824284
Kurtosis: 55.2629372
Mean: 1259.471947
Mean absolute deviation (MAD): 1200.528299
Skewness: 6.042422989
Sum: 211742423.7
Variance: 5944411.484
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.9030000e+01 2.4750000e+01 3.0970000e+01 3.2405000e+01 3.3840000e+01 ... 2.0449205e+04 2.3521360e+04 3.0419135e+04 4.1608715e+04 5.4514880e+04], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
504.07 | 12829 | 7.6%
512.07 | 3364 | 2.0%
528.07 | 3135 | 1.9%
520.07 | 3046 | 1.8%
453.48 | 2710 | 1.6%
536.07 | 2546 | 1.5%
451.47 | 2097 | 1.2%
480.07 | 2074 | 1.2%
544.07 | 1694 | 1.0%
241.31 | 1557 | 0.9%
Other values (51762) | 133068 | 79.2%

Minimum 5 values

Value | Count | Frequency (%)
19.03 | 5 | < 0.1%
30.47 | 657 | 0.4%
31.47 | 38 | < 0.1%
33.34 | 582 | 0.3%
34.34 | 16 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
54514.88 | 1 | < 0.1%
53127.45 | 1 | < 0.1%
51369.08 | 1 | < 0.1%
51199.15 | 1 | < 0.1%
50812.25 | 1 | < 0.1%

VAL_SP
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 11052
Unique (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207.36150808945987
Minimum: 5.1
Maximum: 15383.27
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 5.1
5th percentile: 25.71
Q1: 55.69
Median: 78.35
Q3: 173.29
95th percentile: 852.6215
Maximum: 15383.27
Range: 15378.17
Interquartile range (IQR): 117.6

Descriptive statistics

Standard deviation: 401.136689
Coefficient of variation (CV): 1.934479994
Kurtosis: 85.73160157
Mean: 207.3615081
Mean absolute deviation (MAD): 206.6830184
Skewness: 6.527403945
Sum: 34861616.74
Variance: 160910.6432
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[5.100000e+00 5.210000e+00 5.330000e+00 5.585000e+00 6.405000e+00 ... 3.271180e+03 4.191710e+03 4.960975e+03 6.932780e+03 1.538327e+04], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
78.35 | 47238 | 28.1%
25.71 | 8637 | 5.1%
26.51 | 7429 | 4.4%
29.4 | 5711 | 3.4%
183.91 | 5573 | 3.3%
74.62 | 4484 | 2.7%
117.52 | 4126 | 2.5%
24.1 | 1857 | 1.1%
44.1 | 1508 | 0.9%
9.91 | 1328 | 0.8%
Other values (11042) | 80229 | 47.7%

Minimum 5 values

Value | Count | Frequency (%)
5.1 | 5 | < 0.1%
5.32 | 60 | < 0.1%
5.34 | 1 | < 0.1%
5.58 | 4 | < 0.1%
5.59 | 224 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
15383.27 | 1 | < 0.1%
14247.6 | 1 | < 0.1%
12311.11 | 1 | < 0.1%
12148.16 | 1 | < 0.1%
11945.28 | 1 | < 0.1%

DT_INTER
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 4487
Unique (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
2008-01-01 | 215 | 0.1%
2011-12-28 | 137 | 0.1%
2012-12-01 | 126 | 0.1%
2017-07-01 | 124 | 0.1%
2017-05-01 | 119 | 0.1%
2017-06-01 | 118 | 0.1%
2019-06-01 | 116 | 0.1%
2014-08-01 | 115 | 0.1%
2016-09-01 | 112 | 0.1%
2015-05-01 | 107 | 0.1%
Other values (4477) | 166831 | 99.2%

Length

Max length: 10
Mean length: 10
Min length: 10

Value | Count | Frequency (%)
Decimal_Number | 10 | 90.9%
Dash_Punctuation | 1 | 9.1%

Value | Count | Frequency (%)
Common | 11 | 100.0%

Value | Count | Frequency (%)
ASCII | 11 | 100.0%

DIAG_PRINC
Categorical

HIGH CARDINALITY
Distinct count: 908
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J189 | 23922 | 14.2%
J159 | 18419 | 11.0%
J960 | 18136 | 10.8%
J180 | 13980 | 8.3%
J188 | 10166 | 6.0%
J459 | 7228 | 4.3%
J158 | 6832 | 4.1%
J353 | 5880 | 3.5%
J219 | 5413 | 3.2%
J449 | 3318 | 2.0%
Other values (898) | 54826 | 32.6%

Length

Max length: 4
Mean length: 3.928414228
Min length: 3

Value | Count | Frequency (%)
Uppercase_Letter | 21 | 67.7%
Decimal_Number | 10 | 32.3%

Value | Count | Frequency (%)
Latin | 21 | 67.7%
Common | 10 | 32.3%

Value | Count | Frequency (%)
ASCII | 31 | 100.0%

DIAG_SECUN
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 696
Unique (%): 0.8%
Missing: 84081
Missing (%): 50.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0000 | 70258 | 41.8%
Y099 | 1599 | 1.0%
J960 | 1276 | 0.8%
R060 | 1254 | 0.7%
J90 | 1059 | 0.6%
A150 | 831 | 0.5%
J189 | 606 | 0.4%
J188 | 601 | 0.4%
Y86 | 449 | 0.3%
J450 | 448 | 0.3%
Other values (686) | 5658 | 3.4%
(Missing) | 84081 | 50.0%

Length

Max length: 4
Mean length: 3.486551273
Min length: 3

Value | Count | Frequency (%)
Uppercase_Letter | 25 | 67.6%
Decimal_Number | 10 | 27.0%
Lowercase_Letter | 2 | 5.4%

Value | Count | Frequency (%)
Latin | 27 | 73.0%
Common | 10 | 27.0%

Value | Count | Frequency (%)
ASCII | 37 | 100.0%

MUNIC_MOV
Real number (ℝ≥0)

Distinct count: 135
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 292565.35914822744
Minimum: 290070
Maximum: 293340
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 290070
5th percentile: 290839.5
Q1: 292740
Median: 292740
Q3: 292740
95th percentile: 292740
Maximum: 293340
Range: 3270
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 543.5335622
Coefficient of variation (CV): 0.001857819271
Kurtosis: 6.672429328
Mean: 292565.3591
Mean absolute deviation (MAD): 331.8710398
Skewness: -2.764663583
Sum: 4.918608818e+10
Variance: 295428.7332
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[290070. 290085. 290560. 290605. 290645. ... 293060. 293095. 293315. 293325. 293340.], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
292740 | 138234 | 82.2%
293070 | 6110 | 3.6%
291610 | 5757 | 3.4%
290570 | 5304 | 3.2%
291920 | 3280 | 2.0%
290650 | 3003 | 1.8%
292100 | 1471 | 0.9%
291005 | 1335 | 0.8%
292950 | 1081 | 0.6%
291992 | 1007 | 0.6%
Other values (125) | 1538 | 0.9%

Minimum 5 values

Value | Count | Frequency (%)
290070 | 18 | < 0.1%
290100 | 1 | < 0.1%
290110 | 1 | < 0.1%
290130 | 3 | < 0.1%
290160 | 10 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
293340 | 1 | < 0.1%
293330 | 10 | < 0.1%
293320 | 84 | < 0.1%
293310 | 1 | < 0.1%
293300 | 5 | < 0.1%

DIAS_PERM
Real number (ℝ≥0)

ZEROS
Distinct count: 135
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.085789911967643
Minimum: 0
Maximum: 347
Zeros: 2756
Zeros (%): 1.6%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 3
Median: 5
Q3: 10
95th percentile: 31
Maximum: 347
Range: 347
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 11.79771026
Coefficient of variation (CV): 1.298479315
Kurtosis: 29.31916807
Mean: 9.085789912
Mean absolute deviation (MAD): 7.507054719
Skewness: 3.712986324
Sum: 1527503
Variance: 139.1859674
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 97.5 99.5 109.5 138.5 347. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
2 | 25475 | 15.2%
3 | 19525 | 11.6%
4 | 16805 | 10.0%
1 | 13402 | 8.0%
5 | 12840 | 7.6%
6 | 10475 | 6.2%
7 | 9399 | 5.6%
8 | 7384 | 4.4%
9 | 4984 | 3.0%
10 | 4355 | 2.6%
Other values (125) | 43476 | 25.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2756 | 1.6%
1 | 13402 | 8.0%
2 | 25475 | 15.2%
3 | 19525 | 11.6%
4 | 16805 | 10.0%

Maximum 5 values

Value | Count | Frequency (%)
347 | 1 | < 0.1%
338 | 1 | < 0.1%
309 | 1 | < 0.1%
308 | 1 | < 0.1%
304 | 1 | < 0.1%

MORTE
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 152383 | 90.6%
1 | 15737 | 9.4%

NACIONAL
Real number (ℝ≥0)

SKEWED
Distinct count: 31
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.078943611705924
Minimum: 10
Maximum: 339
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 10
5th percentile: 10
Q1: 10
Median: 10
Q3: 10
95th percentile: 10
Maximum: 339
Range: 329
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.884344283
Coefficient of variation (CV): 0.2861752575
Kurtosis: 6956.403672
Mean: 10.07894361
Mean absolute deviation (MAD): 0.1576486834
Skewness: 72.1097067
Sum: 1694472
Variance: 8.319441945
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10. 15. 34.5 39.5 44.5 47.5 109.5 339. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
10 | 167866 | 99.8%
45 | 132 | 0.1%
35 | 16 | < 0.1%
81 | 15 | < 0.1%
71 | 14 | < 0.1%
39 | 12 | < 0.1%
37 | 7 | < 0.1%
100 | 6 | < 0.1%
21 | 5 | < 0.1%
110 | 5 | < 0.1%
Other values (21) | 42 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
10 | 167866 | 99.8%
20 | 3 | < 0.1%
21 | 5 | < 0.1%
30 | 2 | < 0.1%
32 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
339 | 2 | < 0.1%
333 | 4 | < 0.1%
264 | 1 | < 0.1%
260 | 1 | < 0.1%
210 | 1 | < 0.1%

INSTRU
Boolean

CONSTANT
REJECTED
Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 168120 | 100.0%

INSC_PN
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 168119 | > 99.9%
2 | 1 | < 0.1%

Length

Max length: 1
Mean length: 1
Min length: 1

Value | Count | Frequency (%)
Decimal_Number | 2 | 100.0%

Value | Count | Frequency (%)
Common | 2 | 100.0%

Value | Count | Frequency (%)
ASCII | 2 | 100.0%

CNES
Real number (ℝ≥0)

Distinct count: 200
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1748946.4465441352
Minimum: 3778
Maximum: 9443665
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 3778
5th percentile: 3816
Q1: 4065
Median: 4294
Q3: 2802104
95th percentile: 6595197
Maximum: 9443665
Range: 9439887
Interquartile range (IQR): 2798039

Descriptive statistics

Standard deviation: 2306781.26
Coefficient of variation (CV): 1.318954771
Kurtosis: 0.8510470548
Mean: 1748946.447
Mean absolute deviation (MAD): 1905419.7
Skewness: 1.274228515
Sum: 2.940328766e+11
Variance: 5.32123978e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[3.7780000e+03 3.7820000e+03 3.7900000e+03 3.8010000e+03 3.8120000e+03 ... 7.1740220e+06 7.2052555e+06 7.2233155e+06 9.4134815e+06 9.4436650e+06], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
2802104 | 23464 | 14.0%
4065 | 20348 | 12.1%
6595197 | 13958 | 8.3%
4278 | 11069 | 6.6%
3980 | 9980 | 5.9%
3816 | 9762 | 5.8%
3859 | 9570 | 5.7%
2532387 | 6110 | 3.6%
4073 | 6100 | 3.6%
2602083 | 5757 | 3.4%
Other values (190) | 52002 | 30.9%

Minimum 5 values

Value | Count | Frequency (%)
3778 | 656 | 0.4%
3786 | 1565 | 0.9%
3794 | 147 | 0.1%
3808 | 1709 | 1.0%
3816 | 9762 | 5.8%

Maximum 5 values

Value | Count | Frequency (%)
9443665 | 1767 | 1.1%
9383298 | 2 | < 0.1%
7223676 | 5 | < 0.1%
7222955 | 497 | 0.3%
7187556 | 239 | 0.1%

RACA_COR
Real number (ℝ≥0)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.22539257673091
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.3 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 3
Median: 99
Q3: 99
95th percentile: 99
Maximum: 99
Range: 98
Interquartile range (IQR): 96

Descriptive statistics

Standard deviation: 44.06951797
Coefficient of variation (CV): 0.6275439176
Kurtosis: -1.227562598
Mean: 70.22539258
Mean absolute deviation (MAD): 40.34709319
Skewness: -0.8787333434
Sum: 11806293
Variance: 1942.122414
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 2.5 3.5 4.5 52. 99. ], "bayesian blocks" binning strategy used)
Common values

Value | Count | Frequency (%)
99 | 117867 | 70.1%
3 | 38179 | 22.7%
2 | 7593 | 4.5%
1 | 3398 | 2.0%
4 | 1076 | 0.6%
5 | 7 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
1 | 3398 | 2.0%
2 | 7593 | 4.5%
3 | 38179 | 22.7%
4 | 1076 | 0.6%
5 | 7 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
99 | 117867 | 70.1%
5 | 7 | < 0.1%
4 | 1076 | 0.6%
3 | 38179 | 22.7%
2 | 7593 | 4.5%

ETNIA
Categorical

MISSING
Distinct count: 2
Unique (%): < 0.1%
Missing: 36659
Missing (%): 21.8%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
0 | 131460 | 78.2%
7 | 1 | < 0.1%
(Missing) | 36659 | 21.8%

Length

Max length: 3
Mean length: 3
Min length: 3

Value | Count | Frequency (%)
Lowercase_Letter | 2 | 40.0%
Decimal_Number | 2 | 40.0%
Other_Punctuation | 1 | 20.0%

Value | Count | Frequency (%)
Common | 3 | 60.0%
Latin | 2 | 40.0%

Value | Count | Frequency (%)
ASCII | 5 | 100.0%

DIAGSEC1
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 600
Unique (%): 9.3%
Missing: 161680
Missing (%): 96.2%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J960 | 1117 | 0.7%
Y86 | 1030 | 0.6%
A419 | 481 | 0.3%
J159 | 378 | 0.2%
J969 | 286 | 0.2%
R579 | 223 | 0.1%
J342 | 178 | 0.1%
J189 | 167 | 0.1%
J158 | 108 | 0.1%
N179 | 86 | 0.1%
Other values (590) | 2386 | 1.4%
(Missing) | 161680 | 96.2%

Length

Max length: 4
Mean length: 3.030145134
Min length: 3

Value | Count | Frequency (%)
Uppercase_Letter | 24 | 66.7%
Decimal_Number | 10 | 27.8%
Lowercase_Letter | 2 | 5.6%

Value | Count | Frequency (%)
Latin | 26 | 72.2%
Common | 10 | 27.8%

Value | Count | Frequency (%)
ASCII | 36 | 100.0%

DIAGSEC2
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 112
Unique (%): 38.2%
Missing: 167827
Missing (%): 99.8%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J960 | 56 | < 0.1%
I10 | 18 | < 0.1%
J159 | 16 | < 0.1%
A419 | 12 | < 0.1%
J969 | 12 | < 0.1%
J189 | 11 | < 0.1%
R579 | 10 | < 0.1%
N179 | 7 | < 0.1%
J348 | 5 | < 0.1%
J342 | 5 | < 0.1%
Other values (102) | 141 | 0.1%
(Missing) | 167827 | 99.8%

Length

Max length: 4
Mean length: 3.001498929
Min length: 3

Value | Count | Frequency (%)
Uppercase_Letter | 18 | 60.0%
Decimal_Number | 10 | 33.3%
Lowercase_Letter | 2 | 6.7%

Value | Count | Frequency (%)
Latin | 20 | 66.7%
Common | 10 | 33.3%

Value | Count | Frequency (%)
ASCII | 30 | 100.0%

DIAGSEC3
Categorical

MISSING
Distinct count: 33
Unique (%): 48.5%
Missing: 168052
Missing (%): > 99.9%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
J960 | 15 | < 0.1%
J159 | 6 | < 0.1%
N179 | 5 | < 0.1%
A419 | 5 | < 0.1%
R579 | 4 | < 0.1%
R578 | 3 | < 0.1%
J189 | 3 | < 0.1%
R570 | 2 | < 0.1%
I509 | 1 | < 0.1%
A418 | 1 | < 0.1%
Other values (23) | 23 | < 0.1%
(Missing) | 168052 | > 99.9%

Length

Max length: 4
Mean length: 3.000392577
Min length: 3

Value | Count | Frequency (%)
Uppercase_Letter | 12 | 50.0%
Decimal_Number | 10 | 41.7%
Lowercase_Letter | 2 | 8.3%

Value | Count | Frequency (%)
Latin | 14 | 58.3%
Common | 10 | 41.7%

Value | Count | Frequency (%)
ASCII | 24 | 100.0%

DIAGSEC4
Categorical

MISSING
Distinct count: 5
Unique (%): 71.4%
Missing: 168113
Missing (%): > 99.9%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
R579 | 3 | < 0.1%
G736 | 1 | < 0.1%
J960 | 1 | < 0.1%
E877 | 1 | < 0.1%
A419 | 1 | < 0.1%
(Missing) | 168113 | > 99.9%

Length

Max length: 4
Mean length: 3.000041637
Min length: 3

Value | Count | Frequency (%)
Decimal_Number | 9 | 56.2%
Uppercase_Letter | 5 | 31.2%
Lowercase_Letter | 2 | 12.5%

Value | Count | Frequency (%)
Common | 9 | 56.2%
Latin | 7 | 43.8%

Value | Count | Frequency (%)
ASCII | 16 | 100.0%

DIAGSEC5
Categorical

MISSING
Distinct count: 1
Unique (%): 100.0%
Missing: 168119
Missing (%): > 99.9%
Memory size: 1.3 MiB
Value | Count | Frequency (%)
B961 | 1 | < 0.1%
(Missing) | 168119 | > 99.9%

Length

Max length: 4
Mean length: 3.000005948
Min length: 3

Value | Count | Frequency (%)
Decimal_Number | 3 | 50.0%
Lowercase_Letter | 2 | 33.3%
Uppercase_Letter | 1 | 16.7%

Value | Count | Frequency (%)
Common | 3 | 50.0%
Latin | 3 | 50.0%

Value | Count | Frequency (%)
ASCII | 6 | 100.0%

DIAGSEC6
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

DIAGSEC7
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

DIAGSEC8
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

DIAGSEC9
Unsupported

MISSING
REJECTED
UNSUPPORTED
Missing: 168120
Missing (%): 100.0%
Memory size: 1.3 MiB

Interactions

Correlations

Pearson's r

The Pearson correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates perfect negative linear correlation, 0 no linear correlation, and +1 perfect positive linear correlation. r is also invariant under separate changes in location and (positive) scale of the two variables. As a result, any exact linear relationship with a positive slope gives r = 1, however steep the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
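That definition translates directly into NumPy. This is a minimal sketch, and the function name is illustrative. Population moments (ddof=0) are used in both the numerator and the denominator, so the normalisation cancels. The usage lines check the result against `np.corrcoef` and the location/scale invariance described above.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Covariance of X and Y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance (ddof=0)
    return float(cov / (x.std() * y.std()))          # np.std also defaults to ddof=0


x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 3.5, 8.0])
r = pearson_r(x, y)
# Separate changes of location and positive scale leave r unchanged
r_shifted = pearson_r(10 * x - 3, 0.5 * y + 100)
print(r, r_shifted, np.corrcoef(x, y)[0, 1])
```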

Spearman's ρ

The Spearman rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates perfect negative monotonic correlation, 0 no monotonic correlation, and +1 perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r computed on the ranks.
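The same idea in code: rank both variables, give tied values their average rank, and apply Pearson's formula to the ranks. This is a minimal sketch with an illustrative function name. Ranking uses pandas' `Series.rank(method="average")`.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y) -> float:
    """Pearson's r computed on the ranks of X and Y (ties get their average rank)."""
    rx = pd.Series(x, dtype=float).rank(method="average").to_numpy()
    ry = pd.Series(y, dtype=float).rank(method="average").to_numpy()
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))


x = np.arange(1.0, 11.0)
y = x ** 3                           # nonlinear but strictly increasing
rho = spearman_rho(x, y)             # perfectly monotonic, so rho is 1
r = float(np.corrcoef(x, y)[0, 1])   # below 1, because the relationship is not linear
print(rho, r)
```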

Kendall's τ

Like Spearman's ρ, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 no correlation, and +1 perfect positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. A pair (i, j) is concordant when x_i - x_j and y_i - y_j have the same sign, and discordant when their signs differ. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
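Here is a brute-force sketch of that pair-counting definition. It is O(n²), so it suits illustration only, and the function name is hypothetical. It implements the tie-free form described above (often called τ-a), in which tied pairs count as neither concordant nor discordant. Libraries such as SciPy's `kendalltau` default to the tie-corrected τ-b, so their results differ whenever either variable has ties.

```python
from itertools import combinations


def kendall_tau_a(x, y) -> float:
    """(concordant - discordant) / (n * (n - 1) / 2), with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1  # pairs with s == 0 (ties) count as neither
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant out of 6 pairs
```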

Missing values

Sample

First rows

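Row | CGC_HOSP | CEP | MUNIC_RES | NASC | SEXO | UTI_MES_TO | UTI_INT_TO | VAL_SH | VAL_SP | DT_INTER | DIAG_PRINC | DIAG_SECUN | MUNIC_MOV | DIAS_PERM | MORTE | NACIONAL | INSTRU | INSC_PN | CNES | RACA_COR | ETNIA | DIAGSEC1 | DIAGSEC2 | DIAGSEC3 | DIAGSEC4 | DIAGSEC5 | DIAGSEC6 | DIAGSEC7 | DIAGSEC8 | DIAGSEC9
0 | NaN | 43805140 | 290650 | 1932-12-22 | 3 | 0 | 0 | 536.07 | 74.62 | 2008-01-14 | J189 | NaN | 290650 | 7 | 0 | 10 | 0 | 0 | 2387581 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1 | NaN | 43805140 | 290650 | 1980-08-30 | 1 | 0 | 0 | 577.97 | 74.62 | 2008-01-12 | J188 | NaN | 290650 | 8 | 0 | 10 | 0 | 0 | 2387581 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
2 | NaN | 40353000 | 292740 | 2006-08-23 | 1 | 0 | 0 | 436.25 | 24.49 | 2008-01-05 | J459 | NaN | 290780 | 3 | 0 | 10 | 0 | 0 | 2532522 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
3 | NaN | 43805140 | 290650 | 1968-10-23 | 3 | 0 | 0 | 544.12 | 33.18 | 2007-12-25 | J81 | NaN | 290650 | 4 | 0 | 10 | 0 | 0 | 2387581 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
4 | NaN | 43805140 | 290650 | 2008-01-12 | 3 | 0 | 0 | 504.07 | 74.62 | 2008-01-13 | J188 | NaN | 290650 | 3 | 0 | 10 | 0 | 0 | 2387581 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
5 | NaN | 43805140 | 290650 | 1937-04-24 | 1 | 0 | 0 | 497.65 | 24.49 | 2008-01-12 | J448 | NaN | 290650 | 4 | 0 | 10 | 0 | 0 | 2387581 | 99 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
6 | NaN | 43700000 | 293070 | 2002-10-16 | 3 | 0 | 0 | 412.25 | 24.49 | 2008-01-06 | J459 | NaN | 293070 | 1 | 0 | 10 | 0 | 0 | 2532387 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
7 | NaN | 43700000 | 293070 | 1995-12-06 | 1 | 0 | 0 | 429.97 | 28.00 | 2008-01-03 | J960 | NaN | 293070 | 0 | 1 | 10 | 0 | 0 | 2532387 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
8 | NaN | 43700000 | 293070 | 2003-02-06 | 1 | 0 | 0 | 412.25 | 24.49 | 2008-01-03 | J459 | NaN | 293070 | 2 | 0 | 10 | 0 | 0 | 2532387 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
9 | NaN | 43700000 | 293070 | 1956-06-04 | 1 | 0 | 0 | 480.07 | 74.62 | 2008-01-17 | J152 | NaN | 293070 | 7 | 0 | 10 | 0 | 0 | 2532387 | 3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN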

Last rows

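Row | CGC_HOSP | CEP | MUNIC_RES | NASC | SEXO | UTI_MES_TO | UTI_INT_TO | VAL_SH | VAL_SP | DT_INTER | DIAG_PRINC | DIAG_SECUN | MUNIC_MOV | DIAS_PERM | MORTE | NACIONAL | INSTRU | INSC_PN | CNES | RACA_COR | ETNIA | DIAGSEC1 | DIAGSEC2 | DIAGSEC3 | DIAGSEC4 | DIAGSEC5 | DIAGSEC6 | DIAGSEC7 | DIAGSEC8 | DIAGSEC9
168110 | NaN | 43700000 | 293070 | 2019-02-03 | 3 | 0 | 0 | 504.07 | 78.35 | 2020-01-08 | J180 | 0000 | 293070 | 4 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168111 | NaN | 43700000 | 293070 | 2019-12-13 | 3 | 0 | 0 | 160.62 | 26.51 | 2020-01-10 | J219 | 0000 | 293070 | 2 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168112 | NaN | 43700000 | 293070 | 2014-03-13 | 3 | 0 | 0 | 552.64 | 89.96 | 2020-01-08 | J180 | 0000 | 293070 | 11 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168113 | NaN | 43700000 | 293070 | 2014-05-21 | 1 | 0 | 0 | 504.07 | 78.35 | 2020-01-10 | J180 | 0000 | 293070 | 6 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168114 | NaN | 43700000 | 293070 | 1984-08-19 | 3 | 0 | 0 | 275.86 | 40.96 | 2019-12-25 | J030 | 0000 | 293070 | 2 | 0 | 10 | 0 | 0 | 2532387 | 99 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168115 | NaN | 43700000 | 293070 | 2017-03-12 | 1 | 0 | 0 | 453.48 | 25.71 | 2020-01-16 | J459 | 0000 | 293070 | 5 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168116 | NaN | 43700000 | 293070 | 2018-11-08 | 3 | 0 | 0 | 504.07 | 78.35 | 2020-01-17 | J180 | 0000 | 293070 | 4 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168117 | NaN | 43700000 | 293070 | 2019-08-25 | 1 | 0 | 0 | 160.62 | 26.51 | 2020-01-17 | J219 | 0000 | 293070 | 6 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168118 | NaN | 43700000 | 293070 | 1955-08-21 | 1 | 0 | 0 | 504.07 | 78.35 | 2020-01-05 | J180 | 0000 | 293070 | 3 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
168119 | NaN | 43700000 | 293070 | 1944-04-22 | 1 | 0 | 0 | 504.07 | 78.35 | 2020-01-04 | J180 | 0000 | 293070 | 5 | 0 | 10 | 0 | 0 | 2532387 | 3 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN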